{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "whjsJasuhstV"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_03_2_text_gen.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "euOZxlIMhstX"
   },
   "source": [
    "# T81-559: Applications of Generative Artificial Intelligence\n",
    "**Module 3: Large Language Models**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d4Yov72PhstY"
   },
   "source": [
    "# Module 3 Material\n",
    "\n",
    "* Part 3.1: Foundation Models [[Video]](https://www.youtube.com/watch?v=Gb0tk5qq1fA) [[Notebook]](t81_559_class_03_1_llm.ipynb)\n",
    "* **Part 3.2: Text Generation** [[Video]](https://www.youtube.com/watch?v=lB97Lqt7q58) [[Notebook]](t81_559_class_03_2_text_gen.ipynb)\n",
    "* Part 3.3: Text Summarization [[Video]](https://www.youtube.com/watch?v=3MoIUXE2eEU) [[Notebook]](t81_559_class_03_3_text_summary.ipynb)\n",
    "* Part 3.4: Text Classification [[Video]](https://www.youtube.com/watch?v=2VpOwFIGmA8) [[Notebook]](t81_559_class_03_4_classification.ipynb)\n",
    "* Part 3.5: LLM Writes a Book [[Video]](https://www.youtube.com/watch?v=iU40Rttlb_Q) [[Notebook]](t81_559_class_03_5_book.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AcAUP0c3hstY"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running and maps Google Drive if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xsI496h5hstZ",
    "outputId": "36f9a20e-3f9e-43d0-8379-3504c206514e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: using Google CoLab\n",
      "Requirement already satisfied: langchain in /usr/local/lib/python3.10/dist-packages (0.2.14)\n",
      "Requirement already satisfied: langchain_openai in /usr/local/lib/python3.10/dist-packages (0.1.22)\n",
      "Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0.2)\n",
      "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.32)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (3.10.3)\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.3)\n",
      "Requirement already satisfied: langchain-core<0.3.0,>=0.2.32 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.2.33)\n",
      "Requirement already satisfied: langchain-text-splitters<0.3.0,>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.2.2)\n",
      "Requirement already satisfied: langsmith<0.2.0,>=0.1.17 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.1.99)\n",
      "Requirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.26.4)\n",
      "Requirement already satisfied: pydantic<3,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.8.2)\n",
      "Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.32.3)\n",
      "Requirement already satisfied: tenacity!=8.4.0,<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.5.0)\n",
      "Requirement already satisfied: openai<2.0.0,>=1.40.0 in /usr/local/lib/python3.10/dist-packages (from langchain_openai) (1.41.1)\n",
      "Requirement already satisfied: tiktoken<1,>=0.7 in /usr/local/lib/python3.10/dist-packages (from langchain_openai) (0.7.0)\n",
      "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (2.3.5)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (24.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.4)\n",
      "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.3.0,>=0.2.32->langchain) (1.33)\n",
      "Requirement already satisfied: packaging<25,>=23.2 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.3.0,>=0.2.32->langchain) (24.1)\n",
      "Requirement already satisfied: typing-extensions>=4.7 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.3.0,>=0.2.32->langchain) (4.12.2)\n",
      "Requirement already satisfied: orjson<4.0.0,>=3.9.14 in /usr/local/lib/python3.10/dist-packages (from langsmith<0.2.0,>=0.1.17->langchain) (3.10.7)\n",
      "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.40.0->langchain_openai) (3.7.1)\n",
      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai<2.0.0,>=1.40.0->langchain_openai) (1.7.0)\n",
      "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.40.0->langchain_openai) (0.27.0)\n",
      "Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.40.0->langchain_openai) (0.5.0)\n",
      "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.40.0->langchain_openai) (1.3.1)\n",
      "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.40.0->langchain_openai) (4.66.5)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (0.7.0)\n",
      "Requirement already satisfied: pydantic-core==2.20.1 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (2.20.1)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.7)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2024.7.4)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (3.0.3)\n",
      "Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken<1,>=0.7->langchain_openai) (2024.5.15)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai<2.0.0,>=1.40.0->langchain_openai) (1.2.2)\n",
      "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai<2.0.0,>=1.40.0->langchain_openai) (1.0.5)\n",
      "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai<2.0.0,>=1.40.0->langchain_openai) (0.14.0)\n",
      "Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.10/dist-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.3.0,>=0.2.32->langchain) (3.0.0)\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "try:\n",
    "    from google.colab import drive, userdata\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# OpenAI Secrets\n",
    "if COLAB:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
    "\n",
    "# Install needed libraries in CoLab\n",
    "if COLAB:\n",
    "    !pip install langchain langchain_openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pC9A-LaYhsta"
   },
   "source": [
    "# 3.2: Text Generation\n",
    "\n",
    "Text generation is one of the most common tasks for LLMs. We've already seen how to use the LLM to generate code; generating regular text for human consumption is similar. To generate text, we will not use a conversational chat style; instead, we will send prompts to LangChain and receive the generated text.\n",
    "\n",
    "We use the following code to query the LLM for text generation.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "TMF-rtxgRAea"
   },
   "outputs": [],
   "source": [
    "from langchain_core.messages import HumanMessage, SystemMessage\n",
    "from langchain_core.prompts.chat import (\n",
    "    ChatPromptTemplate,\n",
    "    HumanMessagePromptTemplate,\n",
    "    SystemMessagePromptTemplate,\n",
    ")\n",
    "from langchain_openai import ChatOpenAI\n",
    "from IPython.display import display_markdown\n",
    "\n",
    "MODEL = 'gpt-4o-mini'\n",
    "TEMPERATURE = 0.2\n",
    "\n",
    "def get_response(llm, prompt):\n",
    "  messages = [\n",
    "      SystemMessage(\n",
    "          content=\"You are a helpful assistant that answers questions accurately.\"\n",
    "      ),\n",
    "      HumanMessage(content=prompt),\n",
    "  ]\n",
    "\n",
    "  print(\"Model response:\")\n",
    "  output = llm.invoke(messages)\n",
    "  display_markdown(output.content, raw=True)\n",
    "\n",
    "# Initialize the OpenAI LLM with your API key\n",
    "llm = ChatOpenAI(\n",
    "  model=MODEL,\n",
    "  temperature=TEMPERATURE,\n",
    "  n= 1,\n",
    "  max_tokens= 256)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DB0IAW8vBJLV"
   },
   "source": [
    "## Text Generation Patterns\n",
    "\n",
    "For simple text generation, you will see several different prompting patterns. These patterns vary depending on the amount of information you provide the LLM. The patterns we will examine in this module are listed here.\n",
    "\n",
    "* Zero-Shot\n",
    "* One-Shot\n",
    "* Few-Shot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9GeVTcwLl4xi"
   },
   "source": [
    "\n",
    "## Zero-Shot Text Generation\n",
    "\n",
    "A zero-shot prompt for text generation is a method where you provide a language model with a single prompt to generate text, without any prior fine-tuning or specific training on related tasks. To use this approach effectively, you should craft a detailed and clear prompt that communicates exactly what you want the model to generate. Include the type of content, style, and any specific information or constraints that are important to the task. For instance, if you're asking for a business email, you might specify the tone (formal or informal), the main points to cover (meeting time, purpose, attendees), and any call to action. The key is to be explicit about the desired output to guide the model's response accurately, as it relies solely on the information provided in the prompt to produce relevant and coherent text. This method is highly versatile and can be applied across various text generation tasks without the need for customized training.\n",
    "\n",
    "The following text is an example of a zero-shot prompt. I make many requests and provide information about the student, but I do not give the LLM a sample to work from."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 274
    },
    "id": "bt_Ra8TOw1SP",
    "outputId": "a79c23ab-56c2-4629-9d04-95d3f81477a1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model response:\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "I am pleased to write this letter of recommendation for John Smith, who was a student in my INFO 558: Applications of Deep Neural Networks course at Washington University. John stood out as an exceptional student during the spring semester of 2020, demonstrating not only a deep understanding of the course material but also a remarkable enthusiasm for the subject matter.\n",
       "\n",
       "John's performance in my class was exemplary, culminating in an A+ grade, which reflects his dedication and intellectual curiosity. He consistently engaged with complex concepts and contributed thoughtfully to class discussions. His ability to grasp intricate topics in deep learning and neural networks was impressive, and he often went above and beyond the coursework to explore additional resources and applications.\n",
       "\n",
       "Since graduating with a 3.99 GPA in Quantitative Finance, John has excelled in his professional role as a Senior Financial Risk Analyst at RGA. His work in developing automation tools and programming for strategic analysis showcases his strong analytical skills and technical proficiency. It is evident that he has a passion for leveraging technology to solve real-world problems, which aligns perfectly with his decision to pursue a Master of Science in Computer Science.\n",
       "\n",
       "John's commitment to continuous learning and self-improvement is commendable. He has demonstrated a proactive approach to enhancing his programming skills while balancing his professional responsibilities. I have no"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "print(get_response(llm, \"\"\"\n",
    "Generate a positive letter of reccomendation for John Smith, a student of mine\n",
    "for INFO 558 at Washington University, my name is Jeff Heaton. He is applying\n",
    "for a Master of Science in Computer Science. Just give me the\n",
    "body text of the letter, no header or footer. Format in markdown.\n",
    "Below is his request.\n",
    "\n",
    "I hope this message finds you well and that you are enjoying the holiday season!\n",
    "I am John Smith (ID: 1234), a proud alumnus of WashU, having graduated in\n",
    "January 2021 with a Master’s degree in Quantitative Finance.\n",
    "\n",
    "During the spring semester of 2020, I had the pleasure of attending your course,\n",
    "INFO 558: Applications of Deep Neural Networks, which was an elective for my\n",
    "master's program. I thoroughly enjoyed the content and was deeply engaged\n",
    "throughout, culminating in an A+ grade.\n",
    "\n",
    "Since graduating with a 3.99 GPA—top of my major—I have been working as a Senior\n",
    "Financial Risk Analyst at RGA. My role primarily involves developing automation\n",
    "tools and programming for strategic analysis and other analytical tasks. To\n",
    "further enhance my programming skills and knowledge, I am planning to pursue a\n",
    "part-time Master's in Computer Science while continuing to work at RGA.\n",
    "\n",
    "I am a great admirer of your work (I’m a regular viewer of your YouTube channel\n",
    "and have recommended it to my colleagues), and your insights would be invaluable\n",
    "in my application. I am applying to the following programs:\n",
    "\n",
    "Georgia Tech, Master of Science in Computer Science\n",
    "University of Pennsylvania, Master of Computer & Information Technology\n",
    "Could I possibly ask for your support with a recommendation letter for these\n",
    "applications? I have attached my resume for your reference and am happy to\n",
    "provide any additional information you might need.\n",
    "\n",
    "Thank you very much for considering my request. I look forward to your\n",
    "positive response.\n",
    "\n",
    "Warm regards,\n",
    "\n",
    "John\n",
    "\"\"\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XhpXOo-xOair"
   },
   "source": [
    "## One-Shot Text Generation\n",
    "\n",
    "A one-shot prompt for text generation is a technique where you provide a single, detailed input to a language model to generate text based on that prompt. To use this effectively, start by crafting a clear and concise prompt that includes all necessary details and context needed for the output you desire. Specify the style, tone, and specific elements you want to include. For example, if you want a descriptive paragraph about a seaside town, mention key details like the time of day, the atmosphere, and any particular imagery or emotions you want to evoke. This precision helps the model understand your expectations and produce more relevant and focused content. Once you've prepared your prompt, simply input it into the text generation tool and evaluate the generated text, tweaking your prompt as needed to refine the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 274
    },
    "id": "Jc5ICsAUOdm0",
    "outputId": "934de759-7150-480e-eeb7-c928105cedab"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model response:\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "I am pleased to write this letter of recommendation for John Smith, who was a student in my INFO 558: Applications of Deep Neural Networks course during the Spring semester of 2020 at Washington University. John excelled in my class, earning an A+ and demonstrating a remarkable aptitude for the subject matter.\n",
       "\n",
       "From the very beginning of the course, John exhibited a strong enthusiasm for deep learning and artificial intelligence. His ability to grasp complex concepts and apply them effectively was evident in his coursework and projects. Despite not having a formal computer science background, John quickly became proficient in Python programming, showcasing his determination and ability to learn rapidly. His projects were not only technically sound but also innovative, reflecting his analytical mindset and creativity.\n",
       "\n",
       "In addition to his academic performance, John actively participated in class discussions, often bringing unique perspectives that enriched the learning experience for his peers. His inquisitive nature and willingness to engage with challenging topics set him apart as a standout student. It was clear to me that John possesses a genuine passion for technology and a desire to deepen his knowledge in the field of computer science.\n",
       "\n",
       "Since graduating with a Master’s degree in Quantitative Finance and achieving an impressive 3.99 GPA, John has been working as a Senior Financial Risk Analyst at RGA. In this role,"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "print(get_response(llm, \"\"\"\n",
    "Generate a positive letter of reccomendation for John Smith, a student of mine\n",
    "for INFO 558 at Washington University, my name is Jeff Heaton. He is applying\n",
    "for a Master of Science in Computer Science. Just give me the\n",
    "body text of the letter, no header or footer. Format in markdown.\n",
    "\n",
    "-----------------\n",
    "This is an example letter of reccomendation, written by me.\n",
    "\n",
    "To Whom It May Concern:\n",
    "John earned an A+ in my course Applications of Deep Neural Networks for the\n",
    "Fall 2019 semester at Washington University in St. Louis. During the semester\n",
    "I got a chance to know John through several discussions, both about my course\n",
    "and his research interests. While John did not come from a computer science\n",
    "background he has demonstrated himself as a capable Python programmer and was\n",
    "able to express his ideas in code.  My primary career is as a VP of data science\n",
    "at RGA, a Fortune 500 insurance company.  In this role I know the value of\n",
    "individuals, such as John, who have a background in finance, understand\n",
    "advanced machine learning topics, and can code sufficiently well to function\n",
    "as a data scientist.\n",
    "\n",
    "-----------\n",
    "The details of this student's request follows.\n",
    "\n",
    "I hope this message finds you well and that you are enjoying the holiday season!\n",
    "I am John Smith (ID: 1234), a proud alumnus of WashU, having graduated in\n",
    "January 2021 with a Master’s degree in Quantitative Finance.\n",
    "\n",
    "During the spring semester of 2020, I had the pleasure of attending your course,\n",
    "INFO 558: Applications of Deep Neural Networks, which was an elective for my\n",
    "master's program. I thoroughly enjoyed the content and was deeply engaged\n",
    "throughout, culminating in an A+ grade.\n",
    "\n",
    "Since graduating with a 3.99 GPA—top of my major—I have been working as a Senior\n",
    "Financial Risk Analyst at RGA. My role primarily involves developing automation\n",
    "tools and programming for strategic analysis and other analytical tasks. To\n",
    "further enhance my programming skills and knowledge, I am planning to pursue a\n",
    "part-time Master's in Computer Science while continuing to work at RGA.\n",
    "\n",
    "I am a great admirer of your work (I’m a regular viewer of your YouTube channel\n",
    "and have recommended it to my colleagues), and your insights would be invaluable\n",
    "in my application. I am applying to the following programs:\n",
    "\n",
    "Georgia Tech, Master of Science in Computer Science\n",
    "University of Pennsylvania, Master of Computer & Information Technology\n",
    "Could I possibly ask for your support with a recommendation letter for these\n",
    "applications? I have attached my resume for your reference and am happy to\n",
    "provide any additional information you might need.\n",
    "\n",
    "Thank you very much for considering my request. I look forward to your\n",
    "positive response.\n",
    "\n",
    "Warm regards,\n",
    "\n",
    "John\n",
    "\"\"\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DYr9PH7ZOeH3"
   },
   "source": [
    "## Few-Shot Text Generation\n",
    "\n",
    "A few-shot prompt involves presenting a model with a small set of examples to guide its behavior in generating responses or predictions. This technique is particularly useful in machine learning models like language or image generation systems, where the prompt acts as a mini-training session, enabling the model to understand and replicate a desired pattern or style with limited input. For instance, in a text generation model, a few-shot prompt might include a handful of sentences along with the desired outputs, setting the stage for the model to continue producing similar results. This approach helps in refining the model's outputs without the need for extensive training data, making it adaptable and efficient for specific tasks or creative nuances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 291
    },
    "id": "uaYfdi_cOkS3",
    "outputId": "25e4ed06-3cf0-4d0f-cc81-d64614876c1f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model response:\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "I am pleased to write this letter of recommendation for John Smith, who was a student in my INFO 558: Applications of Deep Neural Networks course during the Spring 2020 semester at Washington University. John excelled in this technically demanding course, earning an A+ and demonstrating a strong grasp of complex concepts in deep learning.\n",
       "\n",
       "From the outset, John exhibited a remarkable enthusiasm for the subject matter. His engagement in class discussions and his ability to tackle challenging programming assignments set him apart from his peers. Despite not having a traditional computer science background, John quickly adapted and showcased his programming skills, particularly in Python, which is essential for implementing deep neural networks. His ability to translate theoretical concepts into practical applications was impressive and indicative of his analytical mindset.\n",
       "\n",
       "In addition to his academic achievements, John has a solid professional background as a Senior Financial Risk Analyst at RGA, where he has been developing automation tools and programming for strategic analysis. This experience not only highlights his technical capabilities but also his ability to apply his knowledge in real-world scenarios, making him an excellent candidate for further studies in computer science.\n",
       "\n",
       "John's dedication to continuous learning and his passion for technology are evident in his decision to pursue a part-time Master's in Computer Science while maintaining his professional responsibilities. I have no doubt that he will bring"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "print(get_response(llm, \"\"\"\n",
    "Generate a positive letter of reccomendation for John Smith, a student of mine\n",
    "for INFO 558 at Washington University, my name is Jeff Heaton. He is applying\n",
    "for a Master of Science in Computer Science. Just give me the\n",
    "body text of the letter, no header or footer. Format in markdown.\n",
    "\n",
    "-----------------\n",
    "Examples of letters of reccomendation, written by me.\n",
    "\n",
    "To Whom It May Concern:\n",
    "John earned an A+ in my course Applications of Deep Neural Networks for the\n",
    "Fall 2019 semester at Washington University in St. Louis. During the semester\n",
    "I got a chance to know John through several discussions, both about my course\n",
    "and his research interests. While John did not come from a computer science\n",
    "background he has demonstrated himself as a capable Python programmer and was\n",
    "able to express his ideas in code.  My primary career is as a VP of data science\n",
    "at RGA, a Fortune 500 insurance company.  In this role I know the value of\n",
    "individuals, such as John, who have a background in finance, understand\n",
    "advanced machine learning topics, and can code sufficiently well to function\n",
    "as a data scientist.\n",
    "\n",
    "John was a student that in my class, T81-558: Application of Deep Neural Networks,\n",
    "for the Spring 2017 semester. This is a technical graduate class which includes\n",
    "students from the Masters of Science lnformation Systems, Management,\n",
    "computer science, and other disciplines. The course teaches students to\n",
    "implement deep neural networks using Google TensorFlow and Keras in the Python\n",
    "programming language. Students are expected to complete four computer programs\n",
    "and complete a final project. John did well in my course and earned an A+ (4.0).\n",
    "\n",
    "-----------\n",
    "The details of this student's request follows.\n",
    "\n",
    "I hope this message finds you well and that you are enjoying the holiday season!\n",
    "I am John Smith (ID: 1234), a proud alumnus of WashU, having graduated in\n",
    "January 2021 with a Master’s degree in Quantitative Finance.\n",
    "\n",
    "During the spring semester of 2020, I had the pleasure of attending your course,\n",
    "INFO 558: Applications of Deep Neural Networks, which was an elective for my\n",
    "master's program. I thoroughly enjoyed the content and was deeply engaged\n",
    "throughout, culminating in an A+ grade.\n",
    "\n",
    "Since graduating with a 3.99 GPA—top of my major—I have been working as a Senior\n",
    "Financial Risk Analyst at RGA. My role primarily involves developing automation\n",
    "tools and programming for strategic analysis and other analytical tasks. To\n",
    "further enhance my programming skills and knowledge, I am planning to pursue a\n",
    "part-time Master's in Computer Science while continuing to work at RGA.\n",
    "\n",
    "I am a great admirer of your work (I’m a regular viewer of your YouTube channel\n",
    "and have recommended it to my colleagues), and your insights would be invaluable\n",
    "in my application. I am applying to the following programs:\n",
    "\n",
    "Georgia Tech, Master of Science in Computer Science\n",
    "University of Pennsylvania, Master of Computer & Information Technology\n",
    "Could I possibly ask for your support with a recommendation letter for these\n",
    "applications? I have attached my resume for your reference and am happy to\n",
    "provide any additional information you might need.\n",
    "\n",
    "Thank you very much for considering my request. I look forward to your\n",
    "positive response.\n",
    "\n",
    "Warm regards,\n",
    "\n",
    "John\n",
    "\"\"\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oFDE3DRgHvq1"
   },
   "source": [
    "## Generating Synthetic Data\n",
    "\n",
    "\n",
    "LLMs (Large Language Models) can be leveraged to generate synthetic data, which is particularly valuable for testing various systems, including those requiring personal information or demographic diversity. For example, LLMs can create detailed biographies for random careers, providing realistic and varied data for simulations, testing algorithms, or training AI models. In this instance, synthetic biographies could be generated for a software engineer, a pediatric nurse, a financial analyst, a high school science teacher, and a marketing manager, each featuring unique backgrounds and career trajectories to ensure robust and comprehensive testing scenarios."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "SRoUOR8eIi07"
   },
   "outputs": [],
   "source": [
    "from langchain_core.messages import HumanMessage, SystemMessage\n",
    "from langchain_core.prompts.chat import (\n",
    "    ChatPromptTemplate,\n",
    "    HumanMessagePromptTemplate,\n",
    "    SystemMessagePromptTemplate,\n",
    ")\n",
    "from langchain_openai import ChatOpenAI\n",
    "from IPython.display import display_markdown\n",
    "\n",
    "MODEL = 'gpt-4o-mini'\n",
    "TEMPERATURE = 0.2\n",
    "\n",
    "def get_response(llm, prompt):\n",
    "  messages = [\n",
    "      SystemMessage(\n",
    "          content=\"\"\"\n",
    "          You are a helpful assistant that generates synthetic data for a person in the career\n",
    "          field you are given. Provide a short bio for the person, not longer than\n",
    "          5 sentences. No markdown. Do not mention the job title specifically.\"\"\"\n",
    "      ),\n",
    "      HumanMessage(content=prompt),\n",
    "  ]\n",
    "\n",
    "  response = llm.invoke(messages)\n",
    "  return response.content\n",
    "\n",
    "# Initialize the OpenAI LLM with your API key\n",
    "llm = ChatOpenAI(\n",
    "  model=MODEL,\n",
    "  temperature=TEMPERATURE,\n",
    "  n= 1,\n",
    "  max_tokens= 256)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bdXcz32RODJs"
   },
   "source": [
    "We begin by creating 5 career types we will generate data for."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "U25feB-bIHSY"
   },
   "outputs": [],
   "source": [
    "CAREER = [\n",
    "    \"software engineer\",\n",
    "    \"pediatric nurse\",\n",
    "    \"financial analyst\",\n",
    "    \"high school science teacher\",\n",
    "    \"marketing manager\"\n",
    "]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RGgajvk9OMy5"
   },
   "source": [
    "We begin by generating a random bio."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ebMRli43IxVD",
    "outputId": "9b612be2-2965-467a-922b-2ab7ac4ccf04"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Alex is a passionate technology enthusiast with a strong background in computer science. With over five years of experience in developing innovative software solutions, Alex has worked on a variety of projects ranging from mobile applications to complex web platforms. They thrive in collaborative environments and enjoy tackling challenging problems with creative approaches. In addition to coding, Alex is dedicated to mentoring junior developers and sharing knowledge within the tech community. Outside of work, they enjoy contributing to open-source projects and exploring the latest advancements in artificial intelligence.\n"
     ]
    }
   ],
   "source": [
    "print(get_response(llm, \"software engineer\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qjLdt7kMOP-Q"
   },
   "source": [
    "Now we generate a CSV file full of these random bios."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "kh4UDM16Jxq7",
    "outputId": "4bc289e2-8c91-4807-ad21-6ba2cc472e8c"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Generating Careers: 100%|██████████| 50/50 [01:52<00:00,  2.24s/it]\n"
     ]
    }
   ],
   "source": [
    "import csv\n",
    "import random\n",
    "from tqdm import tqdm  # Progress bar library\n",
    "\n",
    "FILENAME = \"jobs.csv\"\n",
    "\n",
    "# Writing to the CSV file\n",
    "with open(FILENAME, 'w', newline='\\n') as csvfile:\n",
    "    csvwriter = csv.writer(csvfile)\n",
    "\n",
    "    # Use tqdm to show progress bar\n",
    "    for i in tqdm(range(50), desc=\"Generating Careers\"):\n",
    "      career_choice = random.choice(CAREER)  # Randomly select a career\n",
    "      csvwriter.writerow([i+1, get_response(llm, career_choice)])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uS4bGv9yOUSH"
   },
   "source": [
    "You can see the generated data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "H-LFTqX3KhFl",
    "outputId": "8f1972f9-1ddd-4995-ccb2-c500342d3f5f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1,\"Dr. Emily Carter is a dedicated healthcare professional with over a decade of experience in patient care. She graduated with honors from a prestigious medical school and completed her residency at a leading hospital. Known for her compassionate approach, she specializes in treating chronic illnesses and emphasizes preventive care. Outside of her practice, Emily is actively involved in community health initiatives and enjoys mentoring aspiring medical students. In her free time, she loves hiking and exploring new culinary experiences.\"\n",
      "2,\"Born in a small town in Texas, she developed a fascination with space at an early age, often gazing at the stars through her childhood telescope. After earning a degree in aerospace engineering, she joined a prestigious research program that focused on developing new technologies for space exploration. Her dedication and innovative spirit led her to participate in several high-profile missions, where she conducted experiments in microgravity and contributed to international collaborations. Outside of her professional life, she is an advocate for STEM education, inspiring young girls to pursue careers in science and technology. With numerous accolades to her name, she continues to push the boundaries of human exploration beyond our planet.\"\n",
      "3,\"Alex is a passionate technology enthusiast with a knack for problem-solving and innovation. With a degree in computer science, they have spent several years developing applications that enhance user experience and streamline processes. Alex enjoys collaborating with cross-functional teams to bring ideas to life and is always eager to learn new programming languages and frameworks. In their free time, they contribute to open-source projects and mentor aspiring developers. Outside of coding, Alex loves hiking and exploring the great outdoors.\"\n",
      "4,\"Born and raised in a small town, she developed a fascination with the stars and space exploration from a young age. After earning a degree in aerospace engineering, she dedicated her life to pushing the boundaries of human knowledge and experience beyond Earth. With numerous missions under her belt, she has conducted groundbreaking research in microgravity and contributed to international collaborations in space. Passionate about inspiring the next generation, she frequently speaks at schools and events, sharing her journey and the importance of STEM education. In her free time, she enjoys hiking, photography, and stargazing with her family.\"\n",
      "5,\"Dr. Emily Carter is a dedicated healthcare professional with over a decade of experience in patient care. She graduated with honors from a prestigious medical school and completed her residency in a renowned hospital. Passionate about community health, she actively participates in outreach programs to educate underserved populations about preventive care. In her free time, she enjoys hiking and volunteering at local shelters. Emily believes in the importance of empathy and communication in fostering strong patient relationships.\"\n"
     ]
    }
   ],
   "source": [
    "with open(FILENAME, 'r') as file:\n",
    "    for _ in range(5):\n",
    "        line = file.readline()\n",
    "        if line:\n",
    "            print(line.strip())\n",
    "        else:\n",
    "            break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "auMHm2SZOWgb"
   },
   "source": [
    "We can download the generated CSV."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 17
    },
    "id": "IQLZF9qHNZKg",
    "outputId": "0bfbd718-09e9-403d-d10a-41ba48122c62"
   },
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "\n",
       "    async function download(id, filename, size) {\n",
       "      if (!google.colab.kernel.accessAllowed) {\n",
       "        return;\n",
       "      }\n",
       "      const div = document.createElement('div');\n",
       "      const label = document.createElement('label');\n",
       "      label.textContent = `Downloading \"${filename}\": `;\n",
       "      div.appendChild(label);\n",
       "      const progress = document.createElement('progress');\n",
       "      progress.max = size;\n",
       "      div.appendChild(progress);\n",
       "      document.body.appendChild(div);\n",
       "\n",
       "      const buffers = [];\n",
       "      let downloaded = 0;\n",
       "\n",
       "      const channel = await google.colab.kernel.comms.open(id);\n",
       "      // Send a message to notify the kernel that we're ready.\n",
       "      channel.send({})\n",
       "\n",
       "      for await (const message of channel.messages) {\n",
       "        // Send a message to notify the kernel that we're ready.\n",
       "        channel.send({})\n",
       "        if (message.buffers) {\n",
       "          for (const buffer of message.buffers) {\n",
       "            buffers.push(buffer);\n",
       "            downloaded += buffer.byteLength;\n",
       "            progress.value = downloaded;\n",
       "          }\n",
       "        }\n",
       "      }\n",
       "      const blob = new Blob(buffers, {type: 'application/binary'});\n",
       "      const a = document.createElement('a');\n",
       "      a.href = window.URL.createObjectURL(blob);\n",
       "      a.download = filename;\n",
       "      div.appendChild(a);\n",
       "      a.click();\n",
       "      div.remove();\n",
       "    }\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.Javascript object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/javascript": [
       "download(\"download_51839c87-5cac-4eda-9c51-587278285099\", \"jobs.csv\", 30961)"
      ],
      "text/plain": [
       "<IPython.core.display.Javascript object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from google.colab import files\n",
    "\n",
    "files.download(FILENAME)\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.11 (genai)",
   "language": "python",
   "name": "genai"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
